Statistical modeling of nuclear data provides a novel approach to nuclear systematics complementary
to established theoretical and phenomenological approaches based on quantum theory. Continuing
previous studies in which global statistical modeling is pursued within the general framework
of machine learning theory, we implement advances in training algorithms designed to improve
generalization, in application to the problem of reproducing and predicting the halflives of nuclear
ground states that decay 100% by the β⁻ mode. More specifically, fully connected, multilayer
feedforward artificial neural network models are developed using the Levenberg-Marquardt optimization
algorithm together with Bayesian regularization and cross-validation. The predictive performance
of models emerging from extensive computer experiments is compared with that of traditional
microscopic and phenomenological models as well as with the performance of other learning systems,
including earlier neural network models as well as the support vector machines recently applied to
the same problem. In discussing the results, emphasis is placed on predictions for nuclei that are
far from the stability line, and especially those involved in r-process nucleosynthesis. It is found
that the new statistical models can match or even surpass the predictive performance of conventional
models for beta-decay systematics and accordingly should provide a valuable additional tool
for exploring the expanding nuclear landscape.

PACS numbers: 23.40.-s, 21.10.Tg, 26.30.+k, 07.05.Mh, 98.80.Ft 



I. INTRODUCTION 

"Numbers are the within of all things. " 
Pythagoras of Samos 

This work is devoted to the development of artificial
neural network models which, after being trained with
a subset of the available experimental data on beta decay
from nuclear ground states, demonstrate significant
reliability in the prediction of β⁻ halflives for nuclides
absent from the training set. The work represents an
exploratory study of the degree to which the existing data
determines the mapping from proton and neutron numbers
to the corresponding β⁻ halflife.

There is an urgent need among nuclear physicists and
astrophysicists for reliable estimates of β⁻-decay halflives
of nuclei far from stability [1, 2]. Among nuclear physicists
this need is driven both by the experimental programs
of existing and future radioactive ion beam facilities
and by the stresses placed on established nuclear
structure theory as totally new areas of the nuclear landscape
are opened for exploration. For nuclear astrophysicists,
such information is intrinsic to an understanding of
supernova explosions - the initialization of the explosion,
the subsequent neutronization of the core material, and
the strength and fate of the shock wave formed - and the
nucleosynthesis of heavy elements above Fe, notably the
r-process [3, 4, 5]. Both the element distribution on the
r-path and the time scale of the r-process are highly
sensitive to the β-decay properties of the neutron-rich nuclei
involved.

In the nuclear chart there are spaces for some 6000 nuclides
between the β-stability line and the neutron-drip
line. Except for a few key nuclei, β decay of r-process
nuclei cannot be studied in terrestrial laboratories, so
the required information must come from nuclear models.
Over the years, a number of approaches for modeling
of β⁻-decay halflives have been proposed and applied.
These include the more phenomenological treatments,
such as the Gross Theory (GT), as well as microscopic
approaches based on the shell model and the
proton-neutron Quasiparticle Random-Phase Approximation
(pnQRPA) in various versions. More recently,
hybrid macroscopic-microscopic and relativistic models
have come on the scene. Some of these approaches emphasize
only global applicability, while others seek self-consistency
or comprehensive inclusion of nuclear correlations.
Table 1 of Ref. [6] provides a convenient summary
of a number of the competing models of beta-decay
systematics.

In Gross Theory, developed by Takahashi, Yamada and
Kondoh [7], gross properties of β⁻ decay over a wide
nuclidic region are predicted by averaging over the final
states of the daughter nucleus. Subsequently, various
refinements and modifications of this treatment have
been introduced. The most recent of these is the so-called
Semi-Gross Theory (SGT), in which the shell effects
of only the parent nucleus are taken into account [8].
On the other hand, in calculations of β⁻-decay
halflives within the shell model, the detailed structure
of the β strength function is considered. Results exist for
lighter nuclei and nuclei at N = 50, 82, and 126. (See
Refs. [9, 10] for recent calculations.) Due to the limits set
by the size of the configuration space, calculations are
not possible for heavy nuclei.

Several groups have carried out extensive pnQRPA
studies including pairing. Efforts along this line by
Klapdor and co-workers [11] began in the framework of
the Nilsson single-particle model, including the Gamow-Teller
residual interaction in the Tamm-Dancoff approximation
(TDA), with pairing treated at the BCS level [12].
This approach has been complemented and refined
by Staudt et al. [13] and Hirsch et al. [14], using
pnQRPA with the Gamow-Teller residual interaction.
The later study by Homma et al. [15], denoted
NBCS + pnQRPA, includes a schematic interaction also
for the first-forbidden (ff) decay. The Klapdor group
has extended the pnQRPA theory to calculate β-decay
halflives in stellar environments using configurations
beyond 1p-1h [16].

The starting point of the β-decay calculations of
Möller and co-workers is the study of nuclear ground-state
masses and deformations based on the finite-range
droplet model (FRDM) and a folded-Yukawa
single-particle potential [17]. The β-decay halflives
for the allowed Gamow-Teller transitions have been
obtained from a pnQRPA calculation after the addition
of pairing and Gamow-Teller residual interactions,
in a procedure denoted FRDM + pnQRPA [18, 19].
In the latest calculations the effect of the ff decay
has been added by using the Gross Theory (pnQRPA
+ ffGT) [20]. Non-relativistic pnQRPA calculations
that aim at self-consistency include the Hartree-Fock-Bogoliubov
+ continuum QRPA (HFB + QRPA) calculations
performed with a Skyrme energy-density functional
for some spherical even-even semi-magic nuclides
with N = 50, 82, 126 [21]. The extended Thomas-Fermi
plus Strutinsky integral method (ETFSI) (an approximation
to the HF method based on a Skyrme-type force plus
a δ-function pairing force) has been elaborated and applied
to large-scale predictions of β⁻ halflives [22]. Recently,
the density functional + continuum QRPA (DF +
CQRPA) approximation, with the spin-isospin effective
NN interaction of the finite Fermi system theory operating
in the ph channel, has been developed for ground-state
properties and Gamow-Teller and ff transitions
of nuclei far from the stability line, and applied near
closed neutron shells at N = 50, 82, 126 and in the region
"east" of 208 Pb @, [H[. In the relativistic framework, 
a pnQRPA calculation (pnRQRPA) based on a relativistic
Hartree-Bogoliubov description of nuclear ground
states with the density-dependent effective interaction
DD-ME1* has been employed to obtain Gamow-Teller
β⁻-decay halflives of neutron-rich nuclei in the N ≈ 50
and N ≈ 82 regions relevant to the r-process [24]. Recently,
an extension of the above framework to include
momentum-dependent nucleon self-energies was applied
in the calculation of β-decay halflives of neutron-rich nuclei
in the Z ≈ 28 and Z ≈ 50 regions [25].

Despite continuing methodological improvements, the
predictive power of these conventional, "theory-thick"
models is rather limited for β⁻-decay halflives of nuclei
that are mainly far from stability. The predictions often
deviate from experiment by one or more orders of
magnitude and show considerable sensitivity to quantities
that are poorly known. In this environment, statistical
modeling based on advanced techniques of statistical
learning theory or "machine learning," notably
artificial neural networks (ANNs) [26], [27] and support
vector machines (SVMs) [23, [H, [2^] , offers an interesting 
and potentially effective alternative for global modeling
of β⁻-decay lifetimes. Such approaches have proven their
value for a variety of scientific problems in astronomy,
high-energy physics, and biochemistry that involve function
approximation and pattern classification [30], [31].
Statistical modeling implementing machine-learning algorithms
is "theory-thin," since it is driven by data with
minimal guidance from mechanistic concepts; thus it is
very different from the "theory-thick" approaches summarized
above. Any nuclear observable X can be viewed
as a mapping from the atomic and neutron numbers Z
and N identifying an arbitrary nuclide, to the corresponding
value of the observable (the β halflife, in the
present study). In machine learning, one attempts to
approximate the mapping (Z, N) → X based only on
an available subset of the data for X, i.e., a body of
training data consisting of known examples of the mapping.
One attempts to infer the mapping, in the sense of
Bayesian probability theory as expounded by Jaynes [32].
Thus, one is asking the question: "To what extent does
the data, and only the data, determine the mapping
(Z, N) → X?" The answer (or answers) to this question
should surely be of fundamental interest, when confronted
with databases as large, complex, and refined as
those existing in nuclear physics.

A learning machine consists of (i) an input interface 
where, for example, input variables Z and N are fed to 
the device in coded form, (ii) a system of intermediate el- 
ements or units that process the input, and (iii) an output 
interface where an estimate of the corresponding observable
of interest, say the beta halflife $T_\beta$, appears for decoding.
Given an adequate body of training data (consisting
of input "patterns" or vectors and their appropriate out- 
puts), a suitable learning algorithm is used to adjust the 
parameters of the machine, e.g., the weights of the con- 
nections between the processing elements in the case of a 
neural network. These parameters are adjusted in such 
a way that the learning machine (a) generates responses 
at the output interface that closely fit the halflives of the 
training examples and (b) serves as a reliable predictor 
of the halflives of the test nuclei absent from the training 
set. In the more mundane language of function approx- 
imation, the learning-machine model provides a means 
for interpolation or extrapolation. 

Neural-network models have already been constructed
for a range of nuclear properties including atomic masses,
neutron separation energies, ground-state spins and parities,
and branching probabilities for different decay channels,
as well as β⁻-decay halflives [30, 31, 33, 34, 35, 36].
Very recently, global statistical models of some of these
properties have also been developed based on support
vector machines [37, 38, 39]. Over time, there has been
steady improvement in the quality of these models, such
that the documented performance of the best examples
approaches or even surpasses that of the traditional
"theory-thick" models in predictive reliability. By their
nature, they should not be expected to compete with traditional
phenomenological or microscopic models in generating
new physical insights. However, their prospects
for revealing new regularities are by no means sterile,
since the explicit formula created by the learning algorithm
for the physical observable being modeled is available
for analysis.

We present here a new global model for the halflives
of nuclear ground states that decay 100% by the β⁻
mode, developed by implementing the most recent advances
in machine-learning algorithms. Sec. II describes
the elements of the model, the training algorithm employed,
steps taken to improve generalization, the data
sets adopted, and the coding schemes used at input and
output interfaces. Performance measures for assessing
the quality of global models of beta lifetimes are reviewed
in Sec. III. The results of our large-scale modeling studies
are reported and evaluated in Sec. IV. Detailed comparisons
are made with experiment, with a selection of the
theory-driven GT and pnQRPA global models, and with
previous ANN and SVM models. This assessment is followed
by the presentation of specific predictions for nuclei
that are situated far from the line of stability, focusing
in particular on those involved in r-process nucleosynthesis.
Finally, Sec. V summarizes the conclusions of the
present study and considers the prospects for further
improvements in statistical prediction of halflives.

FIG. 1: Architecture of a typical fully connected feedforward
network having an input layer with three units, two hidden
layers each containing five units, and a single output unit,
thus of structure [3 - 5 - 5 - 1 | 56].



II. THE MODEL 
A. Network Architecture and Dynamics 

Artificial neural networks, whose structure is inspired 
by the anatomy of natural neural systems, consist of in- 
terconnected dynamical units (sometimes called neurons) 
that are typically arranged in a distinct layered topology. 
Also in analogy with biological neural systems, the func- 
tion of the network, for example pattern recognition, is 
determined by the connections between the units. In 
the work to be reported, we have focused exclusively on 
feedforward networks, in which information flows unidi- 
rectionally from an input layer through one or more inter- 
mediate (hidden) layers to an output layer. Lateral and 
feedback connections are absent, but otherwise the net- 
work is fully connected. The activation of hidden units 
is nonlinear, whereas the output units transform their 
inputs linearly. The architecture of such a network is 
indicated by the notation 



$$[\, I - H_1 - H_2 - \cdots - H_L - O \mid W \,] \tag{1}$$



where I is the number of inputs, $H_i$ is the number of neurons
in the i-th hidden layer, O is the number of units in
the output layer, and W is the total number of parameters
needed to complete the specification of the network,
consisting of the weights of the connections and the biases
of the units. Fig. 1 depicts a typical fully connected
network of the class used in our statistical modeling, in
this case having architecture [3 - 5 - 5 - 1 | 56].
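
The parameter count W in this notation can be checked with a few lines of code. The sketch below is only an illustration, and the function name `count_parameters` is a hypothetical helper, not something taken from the original work. It assumes full connectivity between consecutive layers and one bias per non-input unit, as described in the text.

```python
def count_parameters(layer_sizes):
    """Return W for a fully connected feedforward net [I, H1, ..., HL, O]."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weights between two consecutive layers
        total += n_out         # one bias per receiving unit
    return total


# Architecture of Fig. 1: 3 inputs, two hidden layers of 5 units, 1 output.
print(count_parameters([3, 5, 5, 1]))  # 56
```

For the network of Fig. 1 this gives (3·5 + 5) + (5·5 + 5) + (5·1 + 1) = 20 + 30 + 6 = 56, in agreement with the value of W quoted in the architecture label.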

The connection from neuron j to neuron i carries a
real-number weight $w_{ij}$. Thus, if $o_j$ is the activity of
neuron j, it provides an input $w_{ij} o_j$ to neuron i. In
addition, each neuron i is assigned a bias parameter $b_i$,
which is summed together with its input signals from
other neurons j to form its total input $u_i$. This quantity
is fed into the activation function $\varphi_i$ characterizing the
response of neuron i. For the neurons in hidden layers,
this function is taken to have the nonlinear hyperbolic
tangent form

$$\varphi(u) = \frac{2}{1 + \exp(-2u)} - 1, \tag{2}$$

while for the neurons in the output layer the symmetric
saturating linear form

$$\varphi(u) = \begin{cases} -1, & u < -1, \\ u, & -1 \le u \le 1, \\ 1, & u > 1 \end{cases} \tag{3}$$

is adopted. The output (or activity) $o_i$ of neuron i is given
by

$$o_i = \varphi_i(u_i) = \varphi_i\Big( b_i + \sum_j w_{ij}\, o_j \Big). \tag{4}$$



We note that with its sign reversed, a neuron's bias can
be viewed as a threshold for its activation. Also, it is
sometimes convenient to regard the bias $b_i$ as the weight
of a connection to neuron i from a virtual unit v that is
always fully "on", i.e., $o_v = 1$. The weights $w_{ij}$ and biases
$b_i$ are adjustable scalar parameters of the untrained network,
available for optimization of the network's performance
in some task, notably classification and function
approximation in the case of applications to global nuclear
modeling. This is usually done by minimizing some
measure of the errors made by the network in response
to inputs corresponding to a set of training examples, or
"training patterns."

The dynamics of the network is exceptionally simple. 
When a pattern p is presented at the input, the system 
computes a response according to two rules: 

(a) The states of all neurons within a given layer, as
specified by the outputs $o_i$ of Eq. (4), are updated in
parallel, and

(b) The layers are updated successively, proceeding 
from the input to the output layer. 
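
The two update rules amount to a single layer-by-layer forward pass. The following is a minimal sketch of that pass, assuming tanh hidden units and a saturating-linear output unit as stated in the text. The helper names (`tanh_unit`, `satlins`, `init_network`, `forward`), the random initialization range, and the input values are illustrative assumptions, not the authors' code.

```python
import math
import random


def tanh_unit(u):
    # Hidden-unit activation: 2 / (1 + exp(-2u)) - 1, which equals tanh(u).
    return 2.0 / (1.0 + math.exp(-2.0 * u)) - 1.0


def satlins(u):
    # Output-unit activation: symmetric saturating linear function.
    return max(-1.0, min(1.0, u))


def init_network(layer_sizes, rng):
    # One (weights, biases) pair per non-input layer; the range is arbitrary.
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    # Rule (b): layers are visited in order from input to output.
    activity = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        phi = satlins if index == len(layers) - 1 else tanh_unit
        # Rule (a): every unit of a layer is computed from the previous layer.
        activity = [phi(b + sum(w * o for w, o in zip(row, activity)))
                    for row, b in zip(weights, biases)]
    return activity


network = init_network([3, 5, 5, 1], random.Random(0))
print(forward(network, [0.1, -0.2, 0.3]))
```

Because each layer depends only on the activities of the layer before it, no iteration to a fixed point is needed: one sweep produces the network response.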

In modeling the systematics of beta lifetimes with this 
approach, we apply a supervised learning algorithm to 
optimize the weights and biases, as described in the sub- 
sections to follow. The patterns p to be learned or pre- 
dicted, examples of the mapping from nuclide to lifetime, 
take the form 



$$(Z_p, N_p) \;\longrightarrow\; \log_{10} T^{p}_{\beta,\mathrm{exp}}, \tag{5}$$

and thus consist of an association between the atomic and
neutron numbers of the parent nuclide and the base-10
log of the experimental halflife $T_{\beta,\mathrm{exp}}$. It is of course
natural to work with the logarithm of $T_\beta$, since the observed
values of $T_\beta$ itself vary over many orders of magnitude.

According to the nature of statistical estimation, realized
here in the application of machine learning techniques
to function approximation, a neural network
model is only one form in which empirical knowledge of
a physical phenomenon of interest (β decay in this case)
may be encoded [27]. As indicated in the introduction,
the present work is at some level an investigation of the
degree to which the available data determines the physical
mapping from Z and N to the corresponding β-decay
halflife. Actually, we do not have knowledge of the exact
functional relationship involved. Thus we should write

$$\log_{10} T_\beta(Z, N) = g(Z, N) + \varepsilon(Z, N), \tag{6}$$

where g(Z, N) is a function that decodes the decay systematics
and ε is a random expectation error - a Gaussian
noise term that represents our ignorance about the
dependence of $T_\beta$ on Z and N. From a heuristic perspective
beyond strict mathematical definitions, this ε
noise term could reflect "chaotic" influences on the phenomenon,
along with missing regularities that could be
more easily modeled and eventually included in the estimate
of the physical quantity $T_\beta$.

The pragmatic objective of the training process in this
application will be to minimize the sum of squared errors
$e_p$ committed by the network model relative to experiment,
for the n patterns p from the available experimental
data (D) that constitute the training set:

$$E_D = \sum_{p=1}^{n} \left( \log_{10} T^{p}_{\beta,\mathrm{exp}} - \log_{10} T^{p}_{\beta,\mathrm{calc}} \right)^2 = \sum_{p=1}^{n} e_p^2. \tag{7}$$

Here $\log_{10} T^{p}_{\beta,\mathrm{calc}}$ is the neural-network output for pattern
(nuclide) p, whereas $\log_{10} T^{p}_{\beta,\mathrm{exp}}$ is the target output.
The quantity $E_D$ is often referred to as a cost function or
objective function and can obviously be used as a measure of
network performance. In practice, its form will be modified
in Subsec. C.2 below so as to improve the network's
ability to generalize, or predict. A network model is said
to generalize well if it performs well for inputs (nuclides)
outside the training set, with the mean-square error for
these "fresh" nuclei providing an appropriate measure of
predictive performance.



B. The Training Algorithm 

In supervised learning, the network is exposed, in suc- 
cession, to the input patterns (nuclides) of the training 
set, and the errors made by the network are recorded. 
One pass through the training set is called an epoch. In 
batch training, weights and biases are incremented after 
each epoch according to a suitable learning algorithm, 
with the expectation of improving subsequent perfor- 
mance on the training set. 

Statistical modeling inevitably involves a tradeoff between
closely fitting the training data and reliability in
interpolation and extrapolation [27], [28]. In the present
application, it is not the goal of network training to
achieve an exact reproduction, by the model, of the
known halflives. This would necessarily entail fitting the
data precisely with a large number of parameters - which
would in general require a complex ANN with many layers
and/or neurons per layer. Obviously, there is no point in
constructing a lookup table of the known beta halflives.
Rather, the goal is to achieve an accurate representation
of the regularities inherent in the training data by means
of a network that is no more complicated than it need
be, thereby promoting good generalization.

We employ a training algorithm within the general 
class of backpropagation learning procedures. There 
are now quite a number of well-tested procedures 
in this class, including steepest-descent, conjugate- 
gradient, Newton, and Levenberg-Marquardt training
algorithms [26]. All of these approaches aim to minimize
an appropriate cost function with respect to the network 
weights and biases. The term backpropagation refers to 
the process by which derivatives of network errors with 
respect to weights/biases can be computed starting from 
the output layer and proceeding backwards toward the 
input. In general, the Levenberg-Marquardt backprop- 
agation (LMBP) algorithm will have the fastest conver- 
gence in function approximation problems, an advantage 
that is especially noticeable if very accurate training is 
required [40].

In the Newton method, minimization of the cost function
is accomplished through the update rule

$$\mathbf{w}_{k+1} = \mathbf{w}_k - \mathbf{H}_k^{-1} \mathbf{g}_k, \tag{8}$$

where $\mathbf{w}_k$ is the vector formed from the weights and biases,
$\mathbf{H}_k$ is the Hessian matrix (the matrix of second
derivatives of the objective function $E_D$ with respect
to the weights and biases) and $\mathbf{g}_k$ is the gradient of
$E_D$ at the current epoch k. As a Newton-based procedure
attempting to approximate the Hessian matrix, the
Levenberg-Marquardt algorithm [26], [41] was designed to
approach second-order training speed without having to
compute second derivatives. When the cost function has
the form of Eq. (7), the Hessian matrix for nonlinear networks
can be approximated as

$$\mathbf{H} \approx \mathbf{J}^{T} \mathbf{J}, \tag{9}$$

where $\mathbf{J}$ is the Jacobian matrix composed of the first
derivatives of the network errors with respect to the
weights/biases. This generates a W × W matrix, where
W is the number of free parameters (weights and biases)
of the network. The gradient $\mathbf{g}$ can be computed
as

$$\mathbf{g} = \mathbf{J}^{T} \mathbf{e}, \tag{10}$$

where $\mathbf{e}$ is the vector whose components are the network
errors $e_p$. (As in Eq. (7), the network error for a given
input pattern is the target value of the estimated quantity,
minus the value produced by the network.)

Adopting the Gauss-Newton approximation, the
Levenberg-Marquardt algorithm then adjusts the weights
according to the Newton-like updating rule

$$\mathbf{w}_{k+1} = \mathbf{w}_k - \left[ \mathbf{J}_k^{T} \mathbf{J}_k + \mu_k \mathbf{I} \right]^{-1} \mathbf{J}_k^{T} \mathbf{e}_k, \tag{11}$$

where $\mathbf{I}$ is the unit matrix.

The factor $\mu_k$ appearing in Eq. (11) is an adjustable
parameter that controls the step size so as to
quench oscillations of the cost function near its minimum.
When $\mu_k$ is very small, LMBP coincides with the
Newton method executed with the approximate Hessian
matrix. When $\mu_k$ is large enough, the matrix
$\mathbf{J}_k^{T}\mathbf{J}_k + \mu_k \mathbf{I}$ in Eq. (11) is
nearly diagonal and the algorithm behaves like a steepest-descent
method with a small step size. Steepest-descent
algorithms are based on linear approximation of the cost
function, while the Newton algorithm involves quadratic
approximation. Newton's method is faster and more accurate
near an error minimum. Therefore the preferred
strategy is to shift toward Newton's method as quickly as
possible. To this end, $\mu_k$ is decreased after each successful
step and is increased only when a tentative step would
raise the cost function. In this way, the cost function is
always reduced at each iteration of the algorithm. The
algorithm begins with $\mu_k$ set to some small value (e.g.,
$\mu_k = 0.01$). If a step does not yield a smaller value for
the cost function, the step is repeated with $\mu_k$ multiplied
by some factor θ > 1 (e.g., θ = 10). Eventually the cost
function should decrease. If a step does produce a smaller
value for the cost function, then $\mu_k$ is divided by θ for
the next step, so that the algorithm will approach Gauss-Newton,
which should provide faster convergence. Thus,
the Levenberg-Marquardt algorithm is advantageous in
implementing a favorable compromise between slow but
guaranteed convergence far from the minimum and fast
convergence in the neighborhood of the minimum.
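
The adaptation of the damping factor described above can be sketched in a few dozen lines. The code below is an illustrative, standard-library-only version of the update rule of Eq. (11), applied to a hypothetical toy problem (a single tanh unit with two parameters) rather than to the network of this work. The names `solve`, `sum_squares`, and `levenberg_marquardt`, the upper bound 1e10 on the damping factor, and the toy data are all assumptions made for the example.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def sum_squares(errors):
    return sum(e * e for e in errors)


def levenberg_marquardt(residuals, jacobian, w, mu=0.01, theta=10.0, epochs=50):
    """Minimize sum(e_p**2); jacobian(w) holds the derivatives de_p/dw_i."""
    cost = sum_squares(residuals(w))
    size = len(w)
    for _ in range(epochs):
        e, J = residuals(w), jacobian(w)
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(size)]
               for i in range(size)]
        Jte = [sum(row[i] * e_p for row, e_p in zip(J, e)) for i in range(size)]
        improved = False
        while mu < 1e10:  # assumed cap on the damping factor
            damped = [[JtJ[i][j] + (mu if i == j else 0.0) for j in range(size)]
                      for i in range(size)]
            step = solve(damped, Jte)
            trial = [w_i - s_i for w_i, s_i in zip(w, step)]
            trial_cost = sum_squares(residuals(trial))
            if trial_cost < cost:   # successful step: accept, reduce damping
                w, cost, mu, improved = trial, trial_cost, mu / theta, True
                break
            mu *= theta             # failed step: repeat with more damping
        if not improved:
            break
    return w, cost


# Toy data (hypothetical), generated from t = 2.0 * tanh(0.5 * x).
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ts = [2.0 * math.tanh(0.5 * x) for x in xs]


def residuals(w):
    return [t - w[0] * math.tanh(w[1] * x) for x, t in zip(xs, ts)]


def jacobian(w):
    # Derivatives of the errors e_p = t_p - y_p, hence the minus signs.
    return [[-math.tanh(w[1] * x),
             -w[0] * x * (1.0 - math.tanh(w[1] * x) ** 2)] for x in xs]


w_fit, final_cost = levenberg_marquardt(residuals, jacobian, [1.0, 1.0])
print(w_fit, final_cost)
```

A step is accepted only if it lowers the cost, so the returned cost can never exceed the starting value; this is the sense in which the text says the cost function is always reduced.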

The key step in the LMBP algorithm is the computation
of the Jacobian matrix. To perform this computation 
we use a variation of the classical backpropagation algo- 
rithm. In the standard backpropagation procedure, one 
computes the derivatives of the squared errors with re- 
spect to the weights and biases of the network. To create 
the Jacobian matrix we need to compute the derivatives 
of the errors, instead of the derivatives of their squares, 
a trivial difference computationally. 



C. Improving Generalization 

To build a viable statistical model, it is imperative 
to avoid the phenomenon of overfitting, which for ex- 
ample occurs when, under excessive training, the net- 
work simply "memorizes" the training data and makes a 
lookup table. Such a network fails to learn the regulari- 
ties of the target mapping that are inherent in the data; 
the network is therefore deficient in generalization. We
seek to avoid overfitting through a combination of well-
established techniques, namely cross-validation [2jj and 
Bayesian regularization [42]. 



1. Cross-Validation 

Cross-validation is a standard statistical technique 
based on dividing the data into three subsets [13 ■ The 
first subset is the learning or training set employed in 
building the model (i.e., in computing the Jacobians and 
updating the network weights and biases). The second 
subset is the validation set, used to evaluate the perfor- 
mance of the model outside the training set and guide the 
choice of model. The error on the validation set is mon- 
itored during the training process. When the network 
begins to overfit the data, the error on the validation 
set will typically begin to rise. If this continues to oc- 
cur for a specified number of iterations, the training is 
stopped, and the weights and biases at the minimum of 
the validation error are reinstated. The third subset is 
the test set. The error on the test set is not used dur- 
ing the training procedure, but it is used to assess the 
generalization performance of the model and to compare 
different models. While effective in suppressing overfit- 
ting, cross-validation tends to produce networks whose 
response is not sufficiently smooth. This is dealt with by 
performing Bayesian regularization together with cross- 
validation. 
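
The stopping criterion based on the validation set can be illustrated as follows. This is a minimal sketch, not the procedure used by the authors: the function name `early_stopping`, the `patience` parameter (the "specified number of iterations"), and the error values are hypothetical, and a real training loop would also store a copy of the weights and biases at the best epoch so that they can be reinstated.

```python
def early_stopping(validation_errors, patience):
    """Return (best_epoch, stop_epoch) for per-epoch validation errors."""
    best_epoch, best_error, bad_epochs = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            # New minimum of the validation error: remember this epoch.
            best_epoch, best_error, bad_epochs = epoch, error, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch  # stop; reinstate best_epoch weights
    return best_epoch, len(validation_errors) - 1


# Hypothetical validation errors that start rising after epoch 2.
errors = [1.0, 0.8, 0.6, 0.65, 0.7, 0.72, 0.5]
print(early_stopping(errors, patience=3))  # (2, 5)
```

In this example training halts at epoch 5, after three epochs without improvement, and the parameters from epoch 2 would be restored; the later drop at epoch 6 is never reached.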



2. Bayesian regularization 

The standard Levenberg-Marquardt algorithm aims
to reduce the sum of squared errors $E_D$, written explicitly
in Eq. (7) for the β-decay problem. However,
in the framework of Bayesian regularization [42],
the Levenberg-Marquardt optimization (backpropagation)
algorithm (denoted LMOBP) minimizes a linear
combination of squared errors and squared network parameters,

$$F = \beta E_D + \alpha E_W, \tag{12}$$

where $E_W$ is the sum of squares of the network weights
(including biases). The multipliers α and β are hyperparameters
defined by

$$\alpha_k = \frac{\gamma_k}{2 E_W} \quad \text{and} \quad \beta_k = \frac{n - \gamma_k}{2 E_D}, \tag{13}$$

where

$$\gamma_k = W - 2 \alpha_k \, \mathrm{tr}\!\left( \mathbf{H}_k^{-1} \right) \tag{14}$$

is the number of parameters (weights and biases) that are
being effectively used by the network, n is the number of
errors, W is the total number of parameters characterizing
the network model (see Eq. (1)) and $\mathbf{H} = \nabla^2 F$ is
the Hessian matrix evaluated for the extended ("regularized")
objective function (12). The full Hessian computation
is again bypassed using the Gauss-Newton approximation,
writing

$$\mathbf{H}_k = \beta_k \nabla^2 E_D + \alpha_k \nabla^2 E_W \approx 2 \beta_k \mathbf{J}_k^{T} \mathbf{J}_k + 2 \alpha_k \mathbf{I}. \tag{15}$$

Thus, the Levenberg-Marquardt optimization algorithm
updates the weights/biases by means of the rule

$$\mathbf{w}_{k+1} = \mathbf{w}_k - \left[ \beta_k \mathbf{J}_k^{T} \mathbf{J}_k + (\mu_k + \alpha_k) \mathbf{I} \right]^{-1} \left[ \beta_k \mathbf{J}_k^{T} \mathbf{e}_k + \alpha_k \mathbf{w}_k \right]. \tag{16}$$

A detailed discussion of the use of Bayesian regularization
in combination with the Levenberg-Marquardt algorithm
can be found in Ref. [43].
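
A small numerical illustration may help fix the roles of the hyperparameters and of the effective number of parameters. The sketch below simply evaluates the definitions given above for made-up numbers; the function name `update_hyperparameters` and all input values are hypothetical, and the trace of the inverse Hessian is passed in directly rather than computed from a network.

```python
def update_hyperparameters(n_params, n_errors, trace_h_inv, alpha, e_d, e_w):
    """Re-estimate (gamma, alpha, beta) from the definitions in the text."""
    gamma = n_params - 2.0 * alpha * trace_h_inv   # effective parameters
    new_alpha = gamma / (2.0 * e_w)                # weight-decay multiplier
    new_beta = (n_errors - gamma) / (2.0 * e_d)    # data-misfit multiplier
    return gamma, new_alpha, new_beta


# Hypothetical two-parameter example: with alpha = 0.5, suppose the
# Gauss-Newton Hessian 2*beta*J^T J + 2*alpha*I is diagonal with entries
# 10 and 1, so that tr(H^-1) = 1/10 + 1/1 = 1.1.
gamma, alpha, beta = update_hyperparameters(
    n_params=2, n_errors=10, trace_h_inv=1.1, alpha=0.5, e_d=4.55, e_w=0.45
)
print(gamma, alpha, beta)
```

Here the effective number of parameters comes out close to 0.9: the direction in which the data term dominates the Hessian (entry 10) counts almost fully, while the direction controlled only by the regularizer (entry 1) contributes nothing.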



D. Training Mode 

Backpropagation learning, as a technique for iterative 
updating of network parameters, can be executed in ei- 
ther the batch or pattern-by-pattern (or "on-line" ) mode. 
In the on-line mode, a pattern is presented to the net- 
work and its response recorded; the Jacobian matrix is 
then computed and the weights/biases updated before 
the next pattern is presented. In the batch mode, on the 
other hand, calculation of the Jacobian and parameter 
updating is performed only after all training examples 
have been presented, i.e., at the end of each epoch. The 
model results reported here are based on the batch mode, 
the choice being made on the empirical basis of findings 
from a substantial number of computer experiments car- 
ried out with both strategies. 



E. Data Sets 

The experimental data used in developing ANN models
of β-decay systematics have been taken from the
Nubase2003 evaluation [44] of nuclear and decay properties
carried out by Audi et al. at the Atomic Mass Data
Center. Restricting attention to those cases in which the
ground state of the parent decays 100% through the β⁻
channel, we form a subset of the beta-decay data denoted
by NuSet-A, consisting of 905 nuclides sorted by halflife.
The halflives of nuclides in this set range from 0.15 × 10⁻²
s for ³⁵Na to 2.43 × 10²³ s for ¹¹³Cd. Of these NuSet-A
nuclides, 543 (60%) have been chosen, at random with
a uniform probability, to form the training set, and 181
(20%) of those remaining have been similarly chosen to
form the validation set. The residual 181 (20%) are reserved
for testing the predictive capability of the models
constructed. Such partitioning of the NuSet-A database
(uniform selection) was implemented to ensure that the
distribution over halflives in the whole set is faithfully
reflected in the learning, validation, and test sets. Fig. 2
shows an example of the results of this procedure, as
viewed in the Z-N diagram.
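
The uniform random partition can be sketched as below. This is only an illustration of a 60/20/20 split with the sizes quoted for NuSet-A; the function name `partition`, the use of Python's `random` module, the seed, and the integer stand-ins for nuclides are assumptions, and the actual selection procedure used in the study may differ in detail.

```python
import random


def partition(items, rng, train_fraction=0.6, validation_fraction=0.2):
    """Shuffle uniformly, then cut into training, validation and test sets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    n_val = round(validation_fraction * len(shuffled))
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# 905 placeholder labels standing in for the NuSet-A nuclides.
train, validation, test = partition(range(905), random.Random(1))
print(len(train), len(validation), len(test))  # 543 181 181
```

Every nuclide lands in exactly one subset, so the test set remains untouched by both the weight updates and the stopping decision.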

FIG. 2: The partitioning of the whole set of halflives into the learning, validation, and test sets as a function of the atomic (Z)
and the neutron (N) numbers. Stable nuclides are also indicated.

FIG. 3: Distribution of halflives over the timescale for NuSet-A
nuclides. NuSet-B nuclides lie to the right of the vertical
gray rectangle.



We also formed a more restricted data set, called
NuSet-B, by eliminating from NuSet-A those nuclei having
halflife greater than 10⁶ s. The halflives in this subset,
which consists of 838 nuclides, range from 0.15 × 10⁻² s
for ³⁵Na to 0.20 × 10⁶ s for ²⁴⁷Pu. Histograms depicting
the lifetime distribution of the NuSet-B nuclides are
shown in Fig. 3, having made a uniform subdivision of the
data into learning, validation, and test sets, consisting
respectively of 503 (≈ 60%), 167 (≈ 20%), and 168 (≈ 20%)
examples. Having excluded the few long-lived examples
from NuSet-A (situated to the right of the vertical line
in Fig. 3), one is then dealing with a more homogeneous
collection of nuclides, a property that facilitates the training
of network models. Accordingly, we have focused our
efforts on NuSet-B. Table VIII gives information on the
distribution of NuSet-B nuclides with respect to the even
versus odd character of Z and N.

When considering the performance of a network model
for examples taken from the whole data set (whether
NuSet-A or NuSet-B), we speak of operation in the Overall
Mode. Similarly, we speak of operation in the Learning,
Validation, and Prediction Modes when studying
performance on the learning, validation, and test sets,
respectively.



F. Coding Schemes at Input and Output Interfaces



In our initial experiments in the design of ANN models for
$\beta$-decay halflife prediction, we employed input coding
schemes that involve only the proton number Z and the neutron
number N. To keep the number of weights to a minimum, we make use
of analog (i.e., floating-point) coding of Z and N through two
dedicated inputs, whose activities represent scaled values of
these variables. The LMOBP algorithm works better when the network
inputs and targets are scaled to the interval [-1,1] than to
(say) the interval [0,1] [26]. Moreover, the range of the
hyperbolic tangent activation function employed by the hidden
units lies in the interval $-1 < f(u) < 1$. The ranges [0,230]
and [0,230] of Z and N are therefore scaled to this interval. The
base-10 log of the $\beta^-$ halflife $T_{\beta,\mathrm{calc}}$,
as calculated by the network for input nuclide $(Z_p, N_p)$, is
represented by the activity of a single analog output unit. For
the same reason as indicated for the input units, the range
[0.17609, 8.9771] of the target values
$\log_{10} T_{\beta,\mathrm{exp}}$ is again scaled to the
interval [-1,1].
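A linear map of the kind described here can be sketched as follows. The helper names `scale_to_unit_interval` and `unscale` are hypothetical, and the simple min-max form is an assumption; only the ranges are taken from the text.

```python
def scale_to_unit_interval(x, lo, hi):
    """Map x from [lo, hi] linearly onto [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def unscale(s, lo, hi):
    """Inverse map from [-1, 1] back to [lo, hi]."""
    return lo + (s + 1.0) * (hi - lo) / 2.0

Z_RANGE = (0.0, 230.0)        # range quoted in the text for Z
N_RANGE = (0.0, 230.0)        # range quoted in the text for N
LOG_T_RANGE = (0.17609, 8.9771)  # range quoted for the log-halflife targets

# Example: network inputs for a nuclide with Z = 28, N = 50.
z_in = scale_to_unit_interval(28, *Z_RANGE)
n_in = scale_to_unit_interval(50, *N_RANGE)

# A network output s in [-1, 1] is converted back to a log-halflife.
log_t = unscale(0.0, *LOG_T_RANGE)
print(z_in, n_in, log_t)
```

The output unit is decoded with the inverse map, so a network activity of 0 corresponds to the midpoint of the target range.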

Also in the primary stages of our study of beta-halflife
systematics, we have assumed that the halflife of a given nucleus
is properly given by an expression of the form of Eq. (6). Such an
expression echoes the essence of Weizsäcker's semi-empirical mass
formula based on the liquid-drop model, with the binding energy
given by a function $B(Z,N)$ representing a statistical estimate
of the physical quantity, plus an additive noise term.

Taking Z and N as the only inputs to the inference 
machine formed by the neural network has, of course, 
the logistical advantage that there is no limitation to the 
range of prediction of nuclear properties across the nuclear
landscape. If, on the other hand, such quantities as
Q-values and neutron separation energies were included
as inputs, one would have to calculate these quantities 
for choices of (Z, N) at which experimental values are not 
available. But this implies a departure from the "ideal" 
of determining the physical mapping from (Z, N) to the 
target nuclear property, based only on the existing body 
of experimental data for that property. The predictions 
of the network model would necessarily be contingent on 
some theoretical model to provide the additional values 
of the input quantities. 

However, estimating a given nuclear property - the log 
lifetime of beta decay in the present case - as a smooth 
function of Z and N has clear limitations. The nuclear 
data itself sends strong messages of the importance of 
pairing and shell effects ("quantal effects") associated 
with the integral nature of Z and N. The problem of 
atomic masses provides the classic example: the liquid 
drop formula must be supplemented by pairing and shell 
corrections to account for the existence of different mass 
surfaces for even-even, odd-A, and odd-odd nuclei and 
other effects of the integral/particulate character of Z 
and N. 

Examination of results from the simple coding scheme with Z and N
alone serving as analog inputs is nevertheless instructive. We
have applied the LMBP training algorithm to develop a network
model with architecture [2-5-5-5-5-1|111]. As shown in Fig. 4, the
model yields a smooth curve that represents a gross fit of the
experimental data involved. The predictive ability of the model
naturally relies on extrapolation based on this curve. These
results demonstrate the need for a more refined model within which
quantal effects such as pairing and shell structure are given an
opportunity to exert themselves, so that the natural fluctuations
are followed in validation and prediction modes, as well as in
the learning (or "fitting") phase.
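The number after the vertical bar in the architecture label is the total count of weights and biases of a fully-connected feedforward network. The sketch below checks this arithmetic; the function name `count_parameters` is hypothetical and the assumption is one bias per hidden and output unit.

```python
def count_parameters(layers):
    """Weights plus biases of a fully-connected feedforward net.

    layers lists the number of units per layer, inputs first.
    """
    weights = sum(a * b for a, b in zip(layers, layers[1:]))
    biases = sum(layers[1:])  # one bias per non-input unit
    return weights + biases

print(count_parameters([2, 5, 5, 5, 5, 1]))  # 111
print(count_parameters([3, 5, 5, 5, 5, 1]))  # 116
```

The two-input network therefore has 90 weights and 21 biases, and adding a third input unit adds five weights.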

A straightforward modification of the input interface of the
network model that can at least partially fulfill this need is
suggested by the extension of the liquid-drop model to include a
pairing-energy term. In addition to the two input units
representing Z and N as floating-point numbers, we introduce a
third input unit representing a discrete parameter analogous to
the pairing constant, namely

$$\delta = \begin{cases} +1, & \text{for e-e nuclei}, \\ \phantom{+}0, & \text{for odd-mass nuclei}, \\ -1, & \text{for o-o nuclei}, \end{cases} \tag{17}$$

which distinguishes between even-Z-even-N, odd-A, and
odd-Z-odd-N nuclides. This simple refinement has the conceptual
advantage of remaining in the spirit of "theory-thin" modeling,
driven purely by data rather than data plus physical intuition
and accepted theory. All that is required is the knowledge that Z
and N are actually integers and recognition of their even or odd
parity. The expression replacing Eq. (6) as a representation of
the inference process performed by the ANN model is evidently

$$\log_{10} T_\beta(Z,N) = g(Z,N,\delta) + \varepsilon(Z,N). \tag{18}$$

We shall see that some shell effects that might impact the
behavior of halflives for allowed and/or forbidden transitions
can, at least to some extent, be taken into account by the
$\delta$ input defined in Eq. (17). It should be mentioned that in
the ANN global models of nuclear mass excess [35], it has proven
advantageous to introduce two binary input units that encode the
even/odd parity of Z and N.
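The third input of Eq. (17) depends only on the parities of Z and N. A minimal sketch, with the hypothetical function name `pairing_input`:

```python
def pairing_input(Z, N):
    """Discrete pairing-type input: +1 (even-even), 0 (odd-A), -1 (odd-odd)."""
    z_even = Z % 2 == 0
    n_even = N % 2 == 0
    if z_even and n_even:
        return 1
    if not z_even and not n_even:
        return -1
    return 0  # exactly one of Z, N is odd, so A = Z + N is odd

print(pairing_input(28, 50))  # even-even -> 1
print(pairing_input(29, 50))  # odd-A     -> 0
print(pairing_input(29, 49))  # odd-odd   -> -1
```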



G. Initialization of Network Parameters 

Proper initialization of the free parameters of the ANN 
- its weights and biases - is a very important and highly 
nontrivial task. One needs to choose an initial point on 
the error surface defined by Eqs. (0), (TT2"|) as close as 
possible to its global minimum with respect to these parameters,
and such that the output of each neuronal unit lies within the
sensitive region of its activation function $\varphi$. We adopt a
method devised by Nguyen and Widrow [46], in which the initial
weights are selected so as to distribute the active region of
each neuron (its "receptive field" in neurobiological parlance)
approximately evenly across the input space of the layer to which
that neuron belongs. The Nguyen-Widrow method has clear advantages
over more naive initializations in that all neurons begin
operating with access to good dynamical range, and all regions of
the input space receive coverage from neurons. Consequently,
training of the network is accelerated.
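For orientation, one commonly quoted form of the Nguyen-Widrow prescription for a single layer with inputs in [-1, 1] is sketched below. This is an assumption about the general technique, not a transcription of the implementation used here: the scale factor $0.7\,H^{1/n}$, the random bias placement, and the function name `nguyen_widrow_init` are illustrative choices, and library implementations may differ in detail.

```python
import math
import random

def nguyen_widrow_init(n_inputs, n_hidden, seed=0):
    """Textbook-style Nguyen-Widrow initialization for one layer.

    Each neuron's weight vector gets a random direction and a common
    length beta; biases are spread over [-beta, beta] so that the
    active regions of the neurons are distributed over the input space.
    """
    rng = random.Random(seed)
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    weights, biases = [], []
    for _ in range(n_hidden):
        w = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        norm = math.sqrt(sum(v * v for v in w))
        weights.append([beta * v / norm for v in w])
        biases.append(rng.uniform(-beta, beta))
    return weights, biases, beta

weights, biases, beta = nguyen_widrow_init(n_inputs=3, n_hidden=5)
print(beta)
```

Rescaling every weight vector to the same length keeps each hyperbolic-tangent unit away from saturation at the start of training.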

FIG. 4: Plot showing calculated and experimental $\beta^-$-decay
halflives for the $_{28}$Ni isotopic chain. Solid dots:
experimental data points. Unfilled dots: new and more precise
experimental halflives recently deduced by Hosmer et al. [45].
Pluses: results generated by the [2-5-5-5-5-1|111] ANN model with
inputs (Z, N). Solid lines trace the calculated values of the
Overall Mode (learning, validation, and test sets), while dotted
lines trace extrapolated values produced by the model.



III. PERFORMANCE MEASURES

The performance of the models we have been developing is assessed
in terms of several commonly used statistical measures, namely,
the Root Mean Square Error ($\sigma_{\mathrm{RMSE}}$), the Mean
Absolute Error ($\sigma_{\mathrm{MAE}}$), and the Normalized Mean
Square Error ($\sigma_{\mathrm{NMSE}}$). For any given data set,
these quantities provide overall measures of the deviation of the
calculated value $\hat{y}_p = \log_{10} T_{\beta,\mathrm{calc}}$
of the log-halflife produced by the model for nuclide $p$ from
the corresponding experimental value
$y_p = \log_{10} T_{\beta,\mathrm{exp}}$. To understand the
network's response in more detail, a Linear Regression Analysis
(LR) is also carried out in which the correlation between
experimental and calculated halflife values is evaluated in terms
of the correlation coefficient (R-value). Definitions of these
quantities follow, with $n$ standing for the total number of
nuclides in each case (the full data set or one of its subsets:
the learning, validation, or test set).



Root Mean Square Error

$$\sigma_{\mathrm{RMSE}} = \left[ \frac{1}{n} \sum_{p=1}^{n} \left( y_p - \hat{y}_p \right)^2 \right]^{1/2} \tag{19}$$



Normalized Mean Square Error 

CT NMSE = — " — 2 • 

E„=i KVp - Vp) 



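(20)



Mean Absolute Error

$$\sigma_{\mathrm{MAE}} = \frac{1}{n} \sum_{p=1}^{n} \left| y_p - \hat{y}_p \right| \tag{21}$$

Those models having smaller values of $\sigma_{\mathrm{RMSE}}$
and $\sigma_{\mathrm{MAE}}$, and $\sigma_{\mathrm{NMSE}}$ closer
to unity, are favored.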
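The measures of Eqs. (19) and (21) can be evaluated as in the sketch below, which takes lists of experimental and calculated log-halflives. The function names and the three-point sample data are purely illustrative assumptions.

```python
import math

def rmse(y_exp, y_calc):
    """Root mean square error of the calculated log-halflives, Eq. (19)."""
    n = len(y_exp)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_exp, y_calc)) / n)

def mae(y_exp, y_calc):
    """Mean absolute error of the calculated log-halflives, Eq. (21)."""
    n = len(y_exp)
    return sum(abs(a - b) for a, b in zip(y_exp, y_calc)) / n

# Made-up log10 halflives, for illustration only.
y_exp = [0.0, 1.0, 2.0]
y_calc = [0.0, 1.0, 4.0]
print(rmse(y_exp, y_calc), mae(y_exp, y_calc))
```

Since both measures act on $\log_{10} T_\beta$, a value of 0.5 corresponds to a typical deviation of about a factor of 3 in the halflife itself.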



Linear Regression (LR)

$$\hat{y}_p = a\, y_p + b. \tag{22}$$

In linear regression, the slope $a$ and the intercept $b$ are
calculated, as well as the correlation coefficient

$$R = \frac{\sum_{p=1}^{n} Y_p \hat{Y}_p}{\left[ \sum_{p=1}^{n} Y_p^2 \, \sum_{p=1}^{n} \hat{Y}_p^2 \right]^{1/2}}, \tag{23}$$

where $Y_p = y_p - \langle y \rangle$ and
$\hat{Y}_p = \hat{y}_p - \langle \hat{y} \rangle$. Values of $R$
greater than 0.8 indicate strong correlations.
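The regression quantities of Eqs. (22) and (23) can be sketched as follows, assuming an ordinary least-squares fit of the calculated values against the experimental ones. The function name `linear_fit_and_r` and the sample data are hypothetical.

```python
import math

def linear_fit_and_r(y_exp, y_calc):
    """Least-squares slope a and intercept b of y_calc = a*y_exp + b,
    together with the correlation coefficient R."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    mean_calc = sum(y_calc) / n
    s_xy = sum((a - mean_exp) * (b - mean_calc) for a, b in zip(y_exp, y_calc))
    s_xx = sum((a - mean_exp) ** 2 for a in y_exp)
    s_yy = sum((b - mean_calc) ** 2 for b in y_calc)
    slope = s_xy / s_xx
    intercept = mean_calc - slope * mean_exp
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r

# Made-up data lying exactly on the line y_calc = 2*y_exp + 1.
slope, intercept, r = linear_fit_and_r([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept, r)
```

An ideal model would give a slope of 1, an intercept of 0, and R equal to 1.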

The above indices necessarily provide only gross as- 
sessments of the quality of our models. In the literature 
on global modeling of /3~ halflives, several additional in- 
dices, perhaps more appropriate to the physical context, 
have been used to analyze perf ormance. The collabora- 
tion led by Klapdor pi El TEE EI EI EH has employed 
the quality measure 



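$$\bar{x}_K = \frac{1}{n} \sum_{p=1}^{n} x_p, \tag{24}$$

wherein

$$x_p = \begin{cases} T_{\beta,\mathrm{exp}} / T_{\beta,\mathrm{calc}}, & \text{if } T_{\beta,\mathrm{exp}} \ge T_{\beta,\mathrm{calc}}, \\ T_{\beta,\mathrm{calc}} / T_{\beta,\mathrm{exp}}, & \text{if } T_{\beta,\mathrm{exp}} < T_{\beta,\mathrm{calc}}, \end{cases} \tag{25}$$

along with the corresponding standard deviation

$$\sigma_K = \left[ \frac{1}{n} \sum_{p=1}^{n} \left( x_p - \bar{x}_K \right)^2 \right]^{1/2}. \tag{26}$$

Again the sums run over the appropriate set of nuclides. Perfect
accuracy is attained when $\bar{x}_K = 1$ and $\sigma_K = 0$.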

In a more incisive assessment, also pursued by Klapdor and
coworkers, one calculates the percentage $m$ of nuclides having
measured ground-state halflife $T_{\beta,\mathrm{exp}}$ within a
prescribed range (e.g., not greater than $10^{6}$ s, 60 s, or
1 s), for which the halflife generated by the model is within a
prescribed tolerance factor $f$ (in particular, 2, 5, or 10) of
the experimental value.
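The Klapdor-type indices can be sketched as below. One point is an assumption rather than something stated explicitly in the text: the mean and standard deviation are taken here over only those nuclides that fall within the tolerance factor, which is suggested by the tabulated means staying below the tolerance factor. The function name `klapdor_summary` and the sample halflives are hypothetical.

```python
import math

def klapdor_summary(t_exp, t_calc, factor):
    """Percentage of nuclides reproduced within `factor`, plus the mean
    and standard deviation of x_p (Eq. (25)) over those nuclides."""
    # x_p >= 1 by construction: larger halflife divided by the smaller.
    x = [max(e, c) / min(e, c) for e, c in zip(t_exp, t_calc)]
    kept = [v for v in x if v <= factor]
    percent = 100.0 * len(kept) / len(x)
    if not kept:
        return percent, None, None
    mean = sum(kept) / len(kept)
    sigma = math.sqrt(sum((v - mean) ** 2 for v in kept) / len(kept))
    return percent, mean, sigma

# Made-up halflives in seconds, for illustration only.
t_exp = [1.0, 2.0, 4.0, 10.0]
t_calc = [2.0, 2.0, 1.0, 1000.0]
percent, mean, sigma = klapdor_summary(t_exp, t_calc, factor=5)
print(percent, mean, sigma)
```

In this toy example three of the four nuclides are reproduced within a factor of 5, so the percentage is 75.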

A measure $M$ similar to $\bar{x}_K$, but defined in terms of
$\log_{10} T_\beta$ rather than $T_\beta$, has been used by Möller
and collaborators [19, 20]; specifically,

$$M = \frac{1}{n} \sum_{p=1}^{n} r_p, \tag{27}$$

where $r_p = \hat{y}_p - y_p = \log_{10}(T_{\beta,\mathrm{calc}} / T_{\beta,\mathrm{exp}})$.
This quantity gives the average position of the points in Fig. 5
for the respective data sets. Its associated standard deviation

$$\sigma_M = \left[ \frac{1}{n} \sum_{p=1}^{n} \left( r_p - M \right)^2 \right]^{1/2} \tag{28}$$

is also examined, and the "total" error of the model for the data
set in question is taken to be

$$\Sigma = \left[ \frac{1}{n} \sum_{p=1}^{n} r_p^2 \right]^{1/2}, \tag{29}$$

which is the same as the $\sigma_{\mathrm{RMSE}}$ defined in
Eq. (19). Model quality is also expressed in terms of
exponentiated versions of these last three quantities, namely the
mean deviation range

$$M^{(10)} = 10^{M}, \tag{30}$$

the mean fluctuation range

$$\sigma_M^{(10)} = 10^{\sigma_M}, \tag{31}$$

and the total error range

$$\Sigma^{(10)} = 10^{\Sigma}. \tag{32}$$

Superior models should have $\Sigma$, $M$, and $\sigma_M$ near
zero, and $M^{(10)}$, $\sigma_M^{(10)}$, and $\Sigma^{(10)}$ near
unity. Again, in a closer analysis of model capabilities, these
indices are evaluated within prescribed halflife domains.
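The indices of Eqs. (27)-(32) can be computed as in the following sketch, which assumes that $r_p$ is the base-10 logarithm of the ratio of calculated to experimental halflife. The function name `moller_measures`, the dictionary keys, and the two-point sample are illustrative assumptions.

```python
import math

def moller_measures(t_exp, t_calc):
    """Mean, standard deviation, and total error of
    r_p = log10(T_calc / T_exp), with their exponentiated versions."""
    r = [math.log10(c / e) for e, c in zip(t_exp, t_calc)]
    n = len(r)
    m = sum(r) / n
    sigma_m = math.sqrt(sum((v - m) ** 2 for v in r) / n)
    total = math.sqrt(sum(v * v for v in r) / n)
    return {
        "M": m, "sigma_M": sigma_m, "Sigma": total,
        "M10": 10.0 ** m, "sigma_M10": 10.0 ** sigma_m, "Sigma10": 10.0 ** total,
    }

# One halflife overestimated and one underestimated by a factor of 10.
q = moller_measures([1.0, 1.0], [10.0, 0.1])
print(q)
```

With these definitions the three unexponentiated indices are linked by $\Sigma^2 = M^2 + \sigma_M^2$, so a model can have a mean deviation near zero and still carry a large total error if its fluctuations are large, as in the toy example above.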



IV. RESULTS AND DISCUSSION 

As already indicated, statistical modeling of $\beta^-$-decay
systematics is more effective when the range of lifetimes
considered is more restricted. Accordingly, the following detailed
presentation and analysis will focus on the properties and
performance of the best ANN model developed using the NuSet-B
database, which is restricted to nuclides with $\beta^-$ halflife
below $10^{6}$ s. The quality of this model will be compared, in
considerable detail, with that of traditional theoretical global
models cited in the introduction, earlier ANN models, and models
provided by another class of learning machines (Support Vector
Machines, or SVMs).

After a large number of computer experiments on networks
developed with different architectures, input/output coding
schemes, activation functions, initialization prescriptions, and
training algorithms [47], we have arrived at an ANN model well
suited to approximate reproduction of the observed
$\beta^-$-decay halflife systematics and prediction of halflives
of nuclides unfamiliar to the network. The preferred network is of
architecture [3-5-5-5-5-1|116]. The hyperbolic tangent sigmoid is
taken as the activation function of neurons in hidden layers, and
a saturated linear function is adopted in the output layer. In
training, the techniques for improving generalization that were
described in Sec. II, namely, Bayesian regularization and
cross-validation, were implemented in combination with the
Levenberg-Marquardt optimization algorithm (LMOBP) and the
Nguyen-Widrow initialization method. The network was taught in
batch mode and the training phase was continued for 696 epochs.
Of the 116 degrees of freedom corresponding to the network
weights and biases, 98 survive the training process; this is the
value of the number $\gamma_k$ defined in Eq. (14).



A. Comparison with Experiment 

In this subsection, we evaluate the performance of our ANN model
by direct comparison with the available experimental data.
Table I collects results for the overall quality measures
(19)-(21) commonly used in statistical analysis as well as the
values of the correlation coefficient R (see Eq. (23)). We may
quote for comparison the root-mean-square errors of 1.08
(learning mode) and 1.82 (prediction mode) obtained in an earlier
ANN model of beta-decay systematics [33].



TABLE I: Performance measures for the learning, validation,
test, and whole sets, achieved by the favored ANN model of
$\beta^-$-decay halflives, a network with architecture
[3-5-5-5-5-1|116] trained on nuclides from NuSet-B.

Performance                 Learning   Validation   Test    Whole
Measure                     Set        Set          Set     Set
-----------------------------------------------------------------
$\sigma_{\mathrm{RMSE}}$    0.53       0.60         0.65    0.57
$\sigma_{\mathrm{NMSE}}$    1.004      0.995        1.012   0.999
$\sigma_{\mathrm{MAE}}$     0.38       0.41         0.46    0.40
R-value                     0.964      0.953        0.947   0.958


These overall measures are silent with respect to specific
physical merits or shortcomings of the model. On the other hand,
such information can be revealed by suitable plots of the results
from applications of the model, as exemplified in Figs. 5-9.

Figs. 5 and 6 present the ratios of calculated to experimental
halflife values. The deviations from the measured values are
clearly visible as departures from the solid line
$T_{\beta,\mathrm{calc}} / T_{\beta,\mathrm{exp}} = 1$. Both
figures show that the model response follows the general trend of
experimental halflives. The scattered points at higher halflife
values imply that forbidden transitions are not adequately taken
into account by the model. On the other hand, shell effects are
included in the right direction, as shown in Figs. 6-8. The
accuracy of model output versus distance from stability can be
inferred from Fig. 7. The local isotopic $\sigma_{\mathrm{RMSE}}$
(Fig. 8) and the absolute deviation of calculated from
experimental $\log_{10} T_\beta$ values (Fig. 7) indicate a
balanced behavior of network response in all $\beta^-$-decay
regions. However, Fig. 7 shows that some less accurate results
are obtained very near the $\beta$-stability line, a feature also
present in the traditional models of Refs. [15, 20]. For nuclei
with very small or very large mass values there are no
significant deviations.

Finally, the regression analysis we have performed, in which
linear fits are made for the learning, validation, and test sets
as well as the full NuSet-B database, serves to demonstrate in a
different way the slight discrepancies between calculated and
observed $\beta^-$-decay halflives, as illustrated in Fig. 9.
Moreover, the resultant R-values (see also Table I) imply that the
observed systematics is smoothly and uniformly mirrored in the
model's responses.

FIG. 5: Ratios of calculated to experimental halflife values for
nuclides in the learning (black), validation (gray), and test
(white) sets selected from NuSet-B, plotted versus halflife
$T_{\beta,\mathrm{exp}}$. Total Error equals $\Sigma^{(10)}$ (see
Eq. (32)).

FIG. 6: Same as Fig. 5, but ratios of calculated to experimental
halflives are plotted against the atomic number Z. The dashed
lines indicate the magic numbers.



TABLE II: Analysis of the deviation between calculated and
experimental $\beta^-$-decay halflives of the [3-5-5-5-5-1|116]
ANN model in the Overall and Prediction Modes, based on the
quality measures $M^{(10)}$ and $\sigma_M^{(10)}$ of Eqs. (30)-(31)
used by Möller and coworkers. The second column denotes the
even/odd character of the parent nucleus in Z and N, while n is
the number of nuclides with experimental halflives lying in the
prescribed range (first column).

(a) ANN Model. Overall Mode.

$T_{\beta,\mathrm{exp}}$ (s)   Class   n     $M^{(10)}$   $\sigma_M^{(10)}$
---------------------------------------------------------------------
< 1                            o-o     76    1.04         2.53
                               odd     125   1.16         2.25
                               e-e     51    1.87         2.45
< 10                           o-o     121   1.11         2.96
                               odd     187   1.10         2.31
                               e-e     87    1.65         2.56
< 100                          o-o     158   1.08         3.06
                               odd     261   1.08         2.45
                               e-e     110   1.58         2.31
< 1000                         o-o     191   1.12         3.06
                               odd     329   1.07         2.73
                               e-e     133   1.63         2.60
< 10^6                         o-o     238   0.93         3.87
                               odd     437   0.97         3.67
                               e-e     163   1.25         3.44

(b) ANN Model. Prediction Mode.

$T_{\beta,\mathrm{exp}}$ (s)   Class   n     $M^{(10)}$   $\sigma_M^{(10)}$
---------------------------------------------------------------------
< 1                            o-o     11    0.86         1.98
                               odd     32    1.05         2.40
                               e-e     7     2.36         3.26
< 10                           o-o     20    0.86         3.76
                               odd     42    0.92         2.61
                               e-e     17    1.80         2.58
< 100                          o-o     28    0.76         3.20
                               odd     57    0.97         2.91
                               e-e     21    1.58         2.98
< 1000                         o-o     35    0.78         3.13
                               odd     68    0.84         3.07
                               e-e     28    1.49         3.04
< 10^6                         o-o     46    0.58         4.71
                               odd     87    0.86         4.07
                               e-e     35    1.14         4.33



B. Comparison with RPA and GT Global Models - 
A Detailed Analysis 

In this subsection, the performance of the favored network model
of $\beta^-$ lifetime systematics is compared with that of
prominent theory-thick global models.



Adopting the quality measures (27)-(32) introduced by Möller and
collaborators, we first compare the performance of our global ANN
model [3-5-5-5-5-1|116] with the global microscopic models based
on the proton-neutron quasiparticle random-phase approximation
(pnQRPA), in particular, the NBCS+pnQRPA model of Homma et
al. [15] and the FRDM+pnQRPA model of Möller et al. [19]. The
efficacy of the ANN model is also compared with that of the
micro-statistical Semi-Gross Theory (SGT) as implemented by
Nakata et al. [8]. Table II lists the ANN values for $M^{(10)}$
and $\sigma_M^{(10)}$ specific to odd-odd, odd-A, and even-even
nuclides. Table III collects the $M^{(10)}$ and
$\sigma_M^{(10)}$ values for the three theory-thick models in the
same format. As seen in these tables, both pnQRPA and SGT models
tend to overestimate the $\beta^-$ halflives of odd-odd nuclei,
while the FRDM calculation tends to underestimate the shorter
halflives for even-even and odd-mass nuclei. The ANN model, on the
other hand, tends to overestimate the halflives of even-even
nuclides, although to a smaller degree; this shortcoming is due,
at least in part, to the relative scarcity of even-even parents.

FIG. 7: Absolute errors of the calculated relative to the
experimental beta-decay halflives of all nuclides (p) in the full
NuSet-B database, plotted versus proton and neutron numbers Z and
N for the [3-5-5-5-5-1|116] network model. The bar on the right
indicates the mapping from the absolute error values
$|e_p| = |\log_{10} T_{\beta,\mathrm{exp}} - \log_{10} T_{\beta,\mathrm{calc}}|$
to the gray scale. Test nuclides are indicated as squares.

FIG. 8: $\sigma_{\mathrm{RMSE}}$ values in each isotopic chain,
for the nuclides in the learning, validation, and test sets, and
the full NuSet-B database, plotted against the mass number A, for
the [3-5-5-5-5-1|116] network model.

Table IV contains values of the performance measures defined in
Eqs. (27)-(32) for three global models of $\beta^-$-decay
halflives. Here the entries are not separated according to
even-even, odd-A, or odd-odd class membership of the nuclides
involved. Included are results for calculations within the
FRDM+pnQRPA model, updated to a more recent mass evaluation [20],
together with corresponding values for a hybrid
"micro-macroscopic" pnQRPA+ffGT treatment, which combines the
QRPA model of allowed Gamow-Teller $\beta$ decay with the Gross
Theory of first-forbidden (ff) decay [20]. In order to permit a
direct comparison with the ANN model, we also report in this table
the results for ANN performance figures determined independently
of the even-even, odd-A, odd-odd nuclidic class distinction,
focusing attention only on the subdivision into halflife ranges.
The improved FRDM+pnQRPA model underestimates long halflives,
whereas the pnQRPA+ffGT approach slightly underestimates
halflives over the full range considered. The tabulated quality
indices indicate that the ANN responses are in closer agreement
with experiment more frequently than the FRDM+pnQRPA
calculations, while the ANN model and the pnQRPA+ffGT approach
perform about equally well.

FIG. 9: Regression analysis for the learning, validation, and
test (prediction mode) sets and for the full database (overall
mode). Solid lines represent the desirable relation
$\log_{10} T_{\beta,\mathrm{calc}} = \log_{10} T_{\beta,\mathrm{exp}}$,
while dashed lines indicate the corresponding best linear fits.
The respective values of the parameters a and b of Eq. (22) and
the correlation coefficient R of Eq. (23) are given in each panel.

The performance of our ANN model may also be evaluated in terms
of the quality measures $\bar{x}_K$ and $\sigma_K$ employed by
Klapdor and coworkers and defined in Eqs. (24) and (26). Table V
includes values of these quantities for the network model, along
with values for the pnQRPA calculation of Staudt et al. [13] and
for the NBCS+pnQRPA approach of Homma et al. [15]. Detailed
comparison shows that, judging from these indices, there is only
a modest decline in the quality of ANN responses in going from
the Overall Mode to the Prediction Mode, and that the performance
of the pnQRPA model is distinctly better than that of the neural
network for shorter halflives but worse for longer halflife
values. We note, however, that the pnQRPA model could be regarded
as over-parameterized compared to more up-to-date models, since
the strengths of the NN interactions are derived from a local
fitting of the experimental data in each chain. Turning to the
NBCS+pnQRPA calculation, it is evident from Table V that the ANN
model generally exhibits smaller discrepancies between calculated
and observed $\beta^-$-decay halflives. For example, the network
model has the ability to reproduce approximately 50% of
experimentally known halflives shorter than $10^{6}$ s within a
factor of 2. It should be noted, however, that the NBCS+pnQRPA
model has fewer adjustable parameters [15].

Viewed as a whole, the analyses presented in Tables II-V
demonstrate that in a clear majority of cases in which the
statistical model of $\beta^-$ halflives is presented with test
nuclides absent from the training and validation sets, it makes
predictions that are closer to experiment than the corresponding
results from traditional models based on quantum many-body theory
and phenomenology. This is ascribed to some extent to the larger
number of adjustable parameters of the current model.

TABLE III: Same analysis as presented in Table II, but instead
assessing the quality of traditional theoretical models,
corresponding specifically to (a) the NBCS+pnQRPA calculation of
Homma et al. [15], (b) the FRDM+pnQRPA calculation of Möller and
coworkers [19], and (c) the SGT calculation by Nakata et al. [8].
Also, these assessments are limited to nuclides with experimental
halflives below 1000 s.

(a) NBCS+pnQRPA Calculation [15].

$T_{\beta,\mathrm{exp}}$ (s)   Class   n     $M^{(10)}$   $\sigma_M^{(10)}$
---------------------------------------------------------------------
< 1                            o-o     28    1.75         4.96
                               odd     31    0.60         2.24
                               e-e     10    1.15         2.36
< 10                           o-o     66    1.89         4.60
                               odd     81    0.92         3.84
                               e-e     34    1.01         2.93
< 100                          o-o     85    3.15         10.51
                               odd     127   1.07         4.29
                               e-e     52    1.13         3.58
< 1000                         o-o     93    3.02         10.25
                               odd     157   1.10         5.55
                               e-e     63    1.39         6.10

(b) FRDM+pnQRPA Calculation [19].

$T_{\beta,\mathrm{exp}}$ (s)   Class   n     $M^{(10)}$   $\sigma_M^{(10)}$
---------------------------------------------------------------------
< 1                            o-o     29    0.59         2.91
                               odd     35    0.59         2.64
                               e-e     10    3.84         3.08
< 10                           o-o     59    0.76         8.83
                               odd     85    0.78         4.81
                               e-e     34    2.50         4.13
< 100                          o-o     88    2.33         49.19
                               odd     133   1.11         9.45
                               e-e     54    2.61         4.75
< 1000                         o-o     115   3.50         72.02
                               odd     194   2.77         71.50
                               e-e     71    6.86         58.48

(c) SGT Calculation [8].

$T_{\beta,\mathrm{exp}}$ (s)   Class   n     $M^{(10)}$   $\sigma_M^{(10)}$
---------------------------------------------------------------------
< 1                            o-o     38    1.45         2.57
                               odd     56    1.75         2.32
                               e-e     19    2.03         2.30
< 10                           o-o     83    1.94         4.10
                               odd     110   1.71         2.36
                               e-e     45    1.58         2.23
< 100                          o-o     115   2.54         8.86
                               odd     174   1.95         3.15
                               e-e     64    1.45         2.40
< 1000                         o-o     144   3.42         15.21
                               odd     232   2.36         5.42
                               e-e     85    1.38         2.81

TABLE IV: Comparison of values of quality indices characterizing
the "theory-thin" neural-network model of the present work and
two "theory-thick" models developed by Möller and coworkers: ANN
model in Overall (a) and Prediction (b) Modes, and (c)
FRDM+pnQRPA and (d) pnQRPA+ffGT models of Ref. [20]. The number n
of nuclides with experimental halflives below the prescribed limit
is given in the second column. The quality indices labeling
columns 3-8 are defined in Eqs. (27)-(32).

(a) ANN Model. Overall Mode.

$T_{\beta,\mathrm{exp}}$ (s)   n     $M$     $M^{(10)}$   $\sigma_M$   $\sigma_M^{(10)}$   $\Sigma$   $\Sigma^{(10)}$
---------------------------------------------------------------------------------------------------------
< 1                            252   0.09    1.24         0.39         2.44                0.40       2.50
< 10                           395   0.08    1.21         0.42         2.60                0.42       2.65
< 100                          529   0.07    1.17         0.43         2.68                0.43       2.71
< 1000                         653   0.07    1.18         0.45         2.84                0.46       2.88
< 10^6                         838   0.00    1.01         0.57         3.70                0.57       3.70

(b) ANN Model. Prediction Mode.

$T_{\beta,\mathrm{exp}}$ (s)   n     $M$     $M^{(10)}$   $\sigma_M$   $\sigma_M^{(10)}$   $\Sigma$   $\Sigma^{(10)}$
---------------------------------------------------------------------------------------------------------
< 1                            50    0.05    1.12         0.41         2.56                0.41       2.58
< 10                           79    0.02    1.05         0.48         3.00                0.48       3.01
< 100                          106   0.00    1.00         0.49         3.08                0.49       3.08
< 1000                         131   -0.03   0.93         0.50         3.16                0.50       3.17
< 10^6                         168   -0.09   0.82         0.64         4.38                0.65       4.44

(c) FRDM+pnQRPA Calculation [20].

$T_{\beta,\mathrm{exp}}$ (s)   n     $M$     $M^{(10)}$   $\sigma_M$   $\sigma_M^{(10)}$   $\Sigma$   $\Sigma^{(10)}$
---------------------------------------------------------------------------------------------------------
< 1                            184   0.03    1.06         0.57         3.72                0.57       3.73
< 10                           306   0.14    1.38         0.77         5.87                0.78       6.04
< 100                          431   0.19    1.55         0.94         8.81                0.96       9.21
< 1000                         546   0.34    2.20         1.28         19.09               1.33       21.17
< 10^6

(d) pnQRPA+ffGT Calculation [20].

$T_{\beta,\mathrm{exp}}$ (s)   n     $M$     $M^{(10)}$   $\sigma_M$   $\sigma_M^{(10)}$   $\Sigma$   $\Sigma^{(10)}$
---------------------------------------------------------------------------------------------------------
< 1                            184   -0.08   0.84         0.48         3.04                0.49       3.08
< 10                           306   -0.03   0.93         0.55         3.52                0.55       3.53
< 100                          431   -0.04   0.91         0.61         4.10                0.61       4.12
< 1000                         546   -0.04   0.92         0.68         4.81                0.68       4.82
< 10^6



C. Comparison with Prior ANN and SVM Models 

Some exploratory applications of artificial neural net- 
works to /3-decay systematics were carried out earlier by 
the Athens-Manchester-St. Louis collaboration and re- 
ported in Refs. [H, [U. The first of these studies ar- 
rived at a fully-connected multilayer feedforward ANN 
model having the simple architecture [16 — 10 — 1 1 181 ] , 
and the second dealt with a similar model with architec- 
ture [17 — 10 — 1 1 191]. Both of these efforts employed 
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TABLE V: Comparison of performance measures character- 
izing the ANN model of the present work, when operating in 
the Overall (a) and Prediction (b) Modes, with corresponding 
values for (c) the pnQRPA model of Staudt et al. [l3l ] and (d) 
the NBCS+pnQRPA model of Homma et al. []]|. The qual- 
ity indices m%, xk, and ok are defined by Eqs. (|24[l - (|26|l . 
The third column reports the percentage m% of nuclides hav- 
ing experimental halflives within the prescribed range (second 
column), for which the calculated halflife lies within a certain 
tolerance factor (first column) of the experimental value. 

(a) ANN Model: Overall Mode. 



factor 


Tf3, cxp (s) 


m% 


xk 


OK 


< 10 


< 10 b 


92.0 


2.46 


1.72 




< 60 


96.5 


2.21 


1.52 




< 1 


97.6 


2.10 


1.39 


< 5 


< 10 6 


82.8 


1.99 


0.95 




< 60 


90.2 


1.88 


0.84 




< 1 


93.7 


1.88 


0.80 


< 2 


< 10 6 


53.5 


1.41 


0.27 




< 60 


60.6 


1.41 


0.27 




< 1 


61.9 


1.41 


0.26 






(b) ANN Model: Prediction Mode. 




factor 


Tp, cxp (s) 


m% 


x K 


OK 


< 10 


< 10 b 


90.5 


2.69 


1.85 




< 60 


96.1 


2.48 


1.64 




< 1 


98.0 


2.24 


1.30 


< 5 


< 10 6 


79.2 


2.10 


0.97 




< 60 


87.3 


2.05 


0.91 




< 1 


94.0 


2.04 


0.89 


< 2 


< 10 6 


49.4 


1.48 


0.28 




< 60 


53.9 


1.48 


0.27 




< 1 


60.0 


1.50 


0.27 



(c) pnQRPA Calculation [13]. 
factor 7/3, cxp (s) m% xk ok 



< 10 


< 10 b 


72.2 


1.85 


1.21 




< 60 


96.3 


1.67 


1.02 




< 1 


99.1 


1.44 


0.40 


< 5 


< 10 6 


69.7 


1.68 


0.76 




< 60 


94.5 


1.56 


0.66 




< 1 


99.1 


1.44 


0.40 


< 2 


< 10 6 


56.4 


1.37 


0.29 




< 60 


82.2 


1.36 


0.29 




< 1 


90.6 


1.35 


0.27 






(d) NBCS+pnQRPA Calculation [15]. 




factor 


T/3,cxp (s) 


m% 


x K 


a 

OK 


< 10 


< 10 b 


76.7 


3.00 






< 60 


87.2 


2.81 






< 1 


95.7 


2.64 




< 5 


< 10 b 

< 60 










< 1 








< 2 


< 10 6 


33.8 


1.43 






< 60 


42.0 


1.41 






< 1 


50.7 


1.43 





a (JK results are not available in Ref. Il5l . 



TABLE VI: Performance measures for the [16-10-1|181]
ANN model constructed by Mavrommatis et al. [33]. The
quality indices x̄_K and σ_K, introduced by Klapdor and
coworkers, are defined in Eqs. (24) and (26), respectively,
while m% is the percentage of nuclides having experimental
halflives within the prescribed range (second column), for
which the calculated halflife lies within the tolerance factor
(first column) of the experimental value.

Prediction Mode. ANN model of Ref. 33.

factor   T_β,exp (s)   m%     x̄_K    σ_K
< 10     < 10^6        82.8   2.78   1.83
         < 60          88.1   2.80   1.83
         < 1           90.0   2.88   1.88
< 5      < 10^6        72.4   2.22   1.07
         < 60          76.2   2.20   1.01
         < 1           76.7   2.23   1.02
< 2      < 10^6        39.7   1.39   0.29
         < 60          42.9   1.44   0.32
         < 1           43.3   1.46   0.32



TABLE VII: Performance measures for the [17-10-1|191]
ANN model constructed by Clark et al. [34]. The quality
indices M^(10) and σ_M^(10), introduced by Möller and coworkers,
are defined in Eqs. pJJl - lpTj) . 

Prediction Mode. ANN model of Ref. 34.

T_β,exp (s)   Class   M^(10)   σ_M^(10)
< 1           o-o     2.05     2.31
              odd     1.08     2.38
              e-e     1.79     2.71
< 10          o-o     2.26     5.42
              odd     1.19     2.44
              e-e     1.31     2.30
< 100         o-o     1.76     5.19
              odd     1.12     3.15
              e-e     0.98     2.67
< 1000        o-o     2.22     6.25
              odd     1.22     5.50
              e-e     0.93     4.78



binary encoding of Z and N at the input, used the same
data sets (which differed from those of the present work),
and implemented a quite orthodox backpropagation algorithm,
incorporating a momentum term to enhance convergence of the
learning process [27]. The main difference between these two
earlier ANN models is the addition, in the second, of an
analog input unit representing the Q-value of the decay.
Tables VI and VII present values for performance measures of
these ANN models operating in the Prediction Mode. (We
concentrate on this aspect of performance, since it relates
directly to the extrapability of the models.) For the
[16-10-1|181] network model, Table VI displays results for the
quality measures used by Klapdor and coworkers, evaluated on
the test set. For the [17-10-1|191] model, Table VII gives results

TABLE VIII: Root-mean-square errors (σ_RMSE) for (a) the
[3-5-5-5-5-1|116] ANN model of the present work,
and (b) the SVM model constructed by Li et al. [37]. Here n
is the number of nuclides in each of the data (sub)sets.

(a) ANN Model.

         Learning Set     Validation Set   Test Set
Class    n      σ_RMSE    n      σ_RMSE    n      σ_RMSE
EE       95     0.52      33     0.52      35     0.64
EO       121    0.55      46     0.77      47     0.57
OE       141    0.46      42     0.53      40     0.66
OO       146    0.56      46     0.52      46     0.71
Total    503    0.53      167    0.58      168    0.65



(b) SVMs Calculation. Li et al. [37].

         Learning Set     Validation Set   Test Set
Class    n      σ_RMSE    n      σ_RMSE    n      σ_RMSE
EE       131    0.55      16     0.57      16     0.62
EO       179    0.41      22     0.42      22     0.51
OE       172    0.41      21     0.47      21     0.47
OO       190    0.52      24     0.4       24     0.52
Total    672    0.47      83     0.46      83     0.53



for the performance measures of Möller and coworkers,
based on the responses of the model to the same test set.
Upon comparison with the entries for M^(10) in Table HT1
one sees that the performance of the 17-input network
model is rather similar to that of the present 6-layer ANN
model, except for odd-odd nuclides, whose lifetimes are
overestimated by the older network. In the case of the
16-input model, comparison of the entries for m% in
Tables VI and V provides substantial evidence for the
superiority of the new ANN model developed here, although
this is not so clearly reflected in the respective x̄_K values.
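
As a reading aid for the quality measures compared above, the following is a minimal sketch of how such indices might be computed from paired experimental and calculated halflives. It rests on assumptions not stated in this section: that the Klapdor-type ratio is x_i = max(T_exp/T_calc, T_calc/T_exp), that x̄_K and σ_K are taken over the nuclides passing the tolerance factor, and that the Möller-type measures are built from log10(T_calc/T_exp). The function names are hypothetical; the defining equations of the paper remain authoritative.

```python
import math


def klapdor_measures(t_exp, t_calc, factor, t_max):
    """Return (m%, mean ratio, ratio spread) for nuclides with t_exp < t_max.

    Assumed definition: x_i = max(T_exp/T_calc, T_calc/T_exp) >= 1, with the
    mean and spread taken over nuclides whose x_i is below `factor`.
    Raises ZeroDivisionError if no nuclide is in range or none is accepted.
    """
    in_range = [(te, tc) for te, tc in zip(t_exp, t_calc) if te < t_max]
    ratios = [max(te / tc, tc / te) for te, tc in in_range]
    accepted = [x for x in ratios if x < factor]
    m_percent = 100.0 * len(accepted) / len(in_range)
    x_mean = sum(accepted) / len(accepted)
    x_sigma = math.sqrt(sum((x - x_mean) ** 2 for x in accepted) / len(accepted))
    return m_percent, x_mean, x_sigma


def moller_measures(t_exp, t_calc):
    """Return (M^(10), sigma_M^(10)) from log10(T_calc/T_exp) (assumed form).

    M^(10) > 1 means the model overestimates halflives on average.
    """
    logs = [math.log10(tc / te) for te, tc in zip(t_exp, t_calc)]
    mean = sum(logs) / len(logs)
    sigma = math.sqrt(sum((r - mean) ** 2 for r in logs) / len(logs))
    return 10.0 ** mean, 10.0 ** sigma


# Toy halflives in seconds, purely illustrative (not data from the paper).
toy_exp = [1.0, 2.0, 10.0, 100.0]
toy_calc = [2.0, 2.0, 1.0, 100.0]
print(klapdor_measures(toy_exp, toy_calc, factor=5, t_max=60))
print(moller_measures([1.0, 1.0], [10.0, 1000.0]))
```

Under these assumptions a value of M^(10) well above unity, as found for the odd-odd class of the 17-input model, signals systematic overestimation of lifetimes.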

From a strategic standpoint, the advantages of the
current ANN model over the earlier ones are twofold.
First, the number of degrees of freedom (weight and bias
parameters) is reduced considerably by the use of analog
encoding of Z and N. Despite the greater number of hidden
layers, the current model, with architecture
[3-5-5-5-5-1|116], has 65 fewer parameters than the
16-input model and 75 fewer than the 17-input model.
Secondly, there is the advantage relative to the latter
model that the current version does not rely on Q-value
input. Because experimental Q-values are not known for all
the nuclides of interest, dropping this input eliminates the
need to call upon theoretical results for input variables.
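
The parameter counts quoted in the architecture labels can be checked with a short calculation. The sketch below assumes a fully connected feedforward network in which every non-input unit carries one bias; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layers):
    """Count weights plus biases of a fully connected feedforward network.

    `layers` lists the number of units per layer, input first. Each pair of
    adjacent layers contributes n_in * n_out weights and n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


current = count_parameters([3, 5, 5, 5, 5, 1])   # present model
older_16 = count_parameters([16, 10, 1])         # 16-input model
older_17 = count_parameters([17, 10, 1])         # 17-input model

print(current, older_16, older_17)               # 116 181 191
print(older_16 - current, older_17 - current)    # 65 75
```

The differences of 65 and 75 agree with the reductions stated in the text.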

As mentioned in the introduction, initial studies of the 
classification and regression problems presented by nu- 
clear systematics have recently been carried out [33, 
using the relatively new methodology of Support Vector
Machines (SVMs). SVMs, which belong to the class of kernel
methods [27], are learning systems having a rigorous basis
in the statistical learning theory developed by Vapnik and
Chervonenkis [28] (VC theory). There are similarities to
multilayer feedforward neural networks, notably in
architecture, but there are also important differences having
to do with the better control over the tradeoff between
complexity and generalization ability within the SVM
framework. Importantly, within this framework there is an
automated process for determining the explicit weights of the
network in terms of a set of support vectors optimally
distilled from among the training patterns [48]. The few
remaining parameters are embodied in the inner-product kernel
that allows one to deal efficiently with the high-dimensional
feature space appropriate to the problem to be solved. The
SVM methodology was originally developed for classification
problems, but has been extended to function approximation
(regression) [27].
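
To make the role of the support vectors and the kernel concrete, the following sketch evaluates the output of an already trained support-vector regression machine as a kernel expansion over its support vectors. Everything specific here is an assumption for illustration: the Gaussian (RBF) kernel, the value of `gamma`, and the toy support vectors and coefficients. The text does not state which kernel was used in the cited studies, and the training step that selects the support vectors is not shown.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma * |x - y|^2); the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_output(x, support_vectors, coefficients, bias, kernel=rbf_kernel):
    """Evaluate f(x) = bias + sum_i c_i * K(s_i, x) over the support vectors.

    The coefficients c_i play the role of the network weights mentioned in
    the text; they are fixed by training once the support vectors are found.
    """
    return bias + sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefficients))


# Hypothetical trained machine with two support vectors in a 2D input space.
svs = [(0.0, 0.0), (1.0, 1.0)]
coefs = [2.0, -1.0]
print(svm_output((0.0, 0.0), svs, coefs, bias=0.5))
```

Only the kernel parameters and a small number of regularization constants remain to be chosen by the modeler, which is the sense in which few free parameters remain.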

The recent applications of SVMs to global modeling of
nuclear properties, including atomic masses, α-decay
chains of superheavy nuclei, ground-state spins and
parities, and β⁻ lifetimes, demonstrate considerable
promise for this approach. As in the present work,
cross-validation is performed, separating the full database
into learning, validation, and test sets. In the existing
studies, the data for a given property are divided into four
nonoverlapping subsets containing input-output pairs for
even-even, even-odd, odd-even, and odd-odd classes of
nuclides distinguished by the parity of Z and N.
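
The division into four parity classes can be sketched as follows. The two-letter labels follow the EE, EO, OE, OO convention of Table VIII, with the assumption (not stated explicitly in this section) that the first letter refers to Z and the second to N; the function names are hypothetical.

```python
def parity_class(z, n):
    """Return 'EE', 'EO', 'OE' or 'OO' for proton number z and neutron number n.

    Assumed convention: first letter is the parity of Z, second that of N.
    """
    return ("E" if z % 2 == 0 else "O") + ("E" if n % 2 == 0 else "O")


def split_by_parity(nuclides):
    """Group (Z, N) pairs into four nonoverlapping parity subsets."""
    subsets = {"EE": [], "EO": [], "OE": [], "OO": []}
    for z, n in nuclides:
        subsets[parity_class(z, n)].append((z, n))
    return subsets


# A few illustrative (Z, N) pairs, e.g. 78Ni is (28, 50).
print(split_by_parity([(28, 50), (28, 49), (29, 50), (31, 52), (49, 82)]))
```

Each subset would then be modeled by its own SVM, which is why four separate approximation processes are needed in that approach.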

Table VIII provides values of the conventional σ_RMSE
performance measure (19), both for the SVM model of
β⁻-decay systematics constructed by Clark et al. [37] and
for the present ANN model. The SVM model demonstrates
better performance based on this comparison, with a few
exceptions involving the even-even nuclides. However, this
comparison is somewhat misleading, since a larger fraction
of the data was used for training, leaving numerically
smaller validation and test sets in the SVM construction.
It must be noted in this regard that the subdivision of the
nuclides into four (Z, N) parity classes requires four
separate SVM approximation processes to be executed. This
can lead to spurious fluctuations in the predictions of
lifetimes for nuclides of isotopic and isotonic chains, as
found in detailed inspection of the outputs of the SVM
model. We should note further, however, that a subsequent
SVM model of β⁻ systematics shows σ_RMSE values
significantly lower than those given in Table VIII for the
SVM model of Li et al.
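
The remark about the larger training fraction can be quantified directly from the "Total" rows of Table VIII. The short calculation below uses only those tabulated counts; the helper name `split_fractions` is hypothetical.

```python
def split_fractions(n_learn, n_valid, n_test):
    """Return the fractions of the database in the learning, validation and test sets."""
    total = n_learn + n_valid + n_test
    return tuple(n / total for n in (n_learn, n_valid, n_test))


ann = split_fractions(503, 167, 168)   # Table VIII(a), Total row
svm = split_fractions(672, 83, 83)     # Table VIII(b), Total row

print([round(f, 2) for f in ann])      # [0.6, 0.2, 0.2]
print([round(f, 2) for f in svm])      # [0.8, 0.1, 0.1]
```

Both partitions contain 838 nuclides in total, but the SVM construction trains on roughly 80% of them against roughly 60% for the ANN, so its validation and test sets are about half the size.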



D. The Extrapability of the ANN Model 

It is of course desirable to have a model that reproduces
experimentally known β⁻ halflives of nuclei across the
known nuclear landscape. One can certainly achieve that goal
with a sufficiently complex model that involves a sufficient
number of adjustable parameters. However, excess complexity
generally implies poor predictive ability, and especially
poor extrapability, that is, lack of the ability to
extrapolate away from existing data. Accordingly, a much
more important and challenging goal is to develop a global
model, statistical or otherwise, with minimal complexity
consistent with good generalization properties. The extent
to which this goal can be achieved with machine-learning
techniques for different nuclear properties is yet to be
decided. Of course, one can test the performance of a
favored network model on outlying nuclei (outlying with
respect to the valley of stability), nuclei that are unknown
to the network, but have known values for the property of
interest. Adequate performance in such tests can provide
some degree of confidence in predictions made by the model
for nearby nuclei that have not yet been reached by
experiment.

FIG. 10: Experimental data and derived halflives from
different models for the isotopic chain of 26Fe.

FIG. 11: The same as in Fig. 10 but for the isotopic chain of
47Ag.

FIG. 12: The same as in Fig. 10 but for the isotopic chain of
50Sn.

FIG. 13: The same as in Fig. 10 but for the isotopic chain of
28Ni.

FIG. 14: The same as in Fig. 10 but for the isotopic chain of
48Cd.

FIG. 15: The same as in Fig. 10 but for the isotopic chain of
83Bi.
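
One way to set up the test on outlying nuclei described in this subsection is to withhold the most neutron-rich known isotopes of each element from training and evaluate the model on them afterwards. The sketch below is a hypothetical illustration of that idea only: it is not the partitioning procedure used in the present work, and the function name and the choice of `k` are assumptions.

```python
from collections import defaultdict


def split_outlying(nuclides, k=2):
    """Split (Z, N) pairs into (interior, outlying) sets.

    For each element Z, the k most neutron-rich isotopes are placed in the
    outlying set, which is withheld from training and used to probe
    extrapolation away from the valley of stability.
    """
    chains = defaultdict(list)
    for z, n in nuclides:
        chains[z].append(n)
    interior, outlying = [], []
    for z, neutron_numbers in sorted(chains.items()):
        ordered = sorted(neutron_numbers)
        cut = max(len(ordered) - k, 0)
        interior += [(z, n) for n in ordered[:cut]]
        outlying += [(z, n) for n in ordered[cut:]]
    return interior, outlying


# Illustrative isotopic chains for Fe (Z = 26) and Ni (Z = 28).
known = [(28, 40), (28, 41), (28, 42), (28, 43), (26, 36), (26, 37), (26, 38)]
inner, outer = split_outlying(known, k=1)
print(outer)   # [(26, 38), (28, 43)]
```

Good agreement on the withheld nuclides would support, though not guarantee, the reliability of predictions one or two steps further along each chain.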

In this subsection, we present some specific evidence of
the extrapability of the [3-5-5-5-5-1|116] ANN model
developed in the present work. Figs. 10-15 show the
halflives estimated by the model for nuclides in the Fe, Ag,
Sn, Ni, Cd, and Bi isotopic chains. Corresponding
pnQRPA+ffGT estimates are included for comparison. Also
included are some results (labeled GT*) from calculations by
Pfeiffer, Kratz, and Möller (4|| based on the early Gross
Theory (GT) of Takahashi et al. Q, with updated mass values
[I7|,l5f| (GT*). There is no unambiguous criterion that can
be used to gauge the performance of these models. Judging
from the observed behavior of the known nuclei, one can
generally expect that the more neutron-rich an exotic
isotope is, the shorter its halflife. This expected downward
tendency is predicted by all the models. One also expects to
see some even-odd stagger of the points for neighboring
isotopes. The ANN model produces such behavior, but probably
overestimates it. Similar behavior, though less pronounced,
appears in the results from continuum-Quasiparticle-RPA
(CQRPA) approaches [23] and in the results of other
theoretical calculations H ufj.



E. The r-Process Path 

Predictions from the ANN model developed here, and
improvements upon it, may prove to be useful for
quantitative studies involving r-process nucleosynthesis.
The β-halflives (T_β) and β-delayed neutron-emission
probabilities (P_n) of those isotopes lying in the r-process
path are the two key β-decay parameters that bear upon the
β-strength function (S_β) p|. Accordingly, an approach
having global applicability for accurate prediction of
β halflives is needed for detailed dynamical r-process
calculations. Moreover, reliable beta-halflife calculations
are of special interest for the r-ladder isotones N = 50,
82, and 126 where solar abundances peak, since they
determine the r-process time scale. In Figs. 16-18 we plot
the halflives of closed-neutron-shell nuclei in these
significant r-process regions as predicted by our ANN model,
in comparison with corresponding results from pnQRPA+ffGT
and GT* calculations [2fl|. In particular, it is interesting
to compare the various estimates of the halflife of the
doubly magic r-process nucleus 78Ni (Z = 28, N = 50). The
result given by the ANN model is consistent with the recent
measurement by Hosmer et al. [45]. In Fig. 19, halflives of
β⁻-decaying nuclides that are found near or on a typical
r-process path with neutron separation energy below 3 MeV
are compared with those from pnQRPA+ffGT and GT*
calculations [H. The results given by the ANN model are
close to the experimental values.

FIG. 16: The same as in Fig. 10 but for the isotonic chain of
N = 50.

FIG. 17: The same as in Fig. 10 but for the isotonic chain of
N = 82.

FIG. 18: The same as in Fig. 10 but for the isotonic chain of
N = 126.

FIG. 19: Halflives for β⁻-decaying nuclides that are found
near or on a typical r-process path with the neutron
separation energy less than or equal to 3 MeV. Legend: Exp.
Data, ANN, pnQRPA+ffGT, GT*. Nuclides shown: 79Cu, 80Zn,
81Ga, 83Ga, 130Cd, 131In, 133In, 134Sn, 135Sb, 138Te.



V. CONCLUSION AND PROSPECTS

A statistical approach to the global modeling of nuclear
properties has been proposed and implemented for treatment
of the systematics of β⁻ lifetimes of the ground states of
nuclei that decay exclusively in this mode. Specifically,
artificial neural networks (ANNs) of multilayer feedforward
architecture are taught to reproduce the experimentally
measured lifetimes of nuclides from a chosen large data set.
Training of the networks is carried out in such a way that
their intrinsic generalization capabilities can be exploited
to predict lifetimes of nuclides outside the data set used
for learning.

We have been able to develop an ANN model of this kind that
demonstrates very good properties in terms of both the
standard performance measures used in statistical analysis
and more problem-specific quality measures that have been
introduced to assess traditional theoretical models for
calculating β⁻ lifetimes on a global scale. In a purely
results-oriented sense (accurate fitting of given data and
good prediction for nuclei not involved in the fitting
process), the performance of this model matches or surpasses
that of traditional models based on nuclear theory and
phenomenology. This success opens the prospect that
statistical modeling based on machine learning can provide a
valuable tool in the exploration of β⁻ halflives of newly
created nuclei beyond the valley of stability.

Experience gained previously with neural-network modeling of
nuclear systematics (especially the modeling of masses
[30, S, 36]) strongly suggests that significant further
improvements on the current ANN model of β⁻ systematics are
possible, as more sophisticated training algorithms and
machine-learning strategies are continuously being
developed. Thus we plan further studies along the same lines
with multilayer feedforward perceptrons, while also
exploring the potential of Support Vector Machines.

It is to be stressed that this program can be no substitute
for aggressive pursuit of traditional, "theory-thick" global
modeling, which inevitably provides greater insight into the
underlying physics responsible for values taken by the
targeted nuclear properties. The statistical approach can
best serve in complementary and supportive roles. We point
out that hybrid statistical-theoretical models show special
promise, as demonstrated in Ref. [36]. In that recent work,
a [4-6-6-6-1|169] ANN is used to model the differences
between measured mass-excess values and the theoretical
values given by the finite-range droplet model (FRDM) of
Ref. [l?], thereby enabling improved prediction of masses
away from stability.

Finally, as this last remark exemplifies, the prospects for
fruitful application of statistical, machine-learning
methods extend to a wide range of nuclear properties beyond
the systematics of β-decay lifetimes.